AITopics | margin condition

margin condition

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

Yang, Luyuan, Shafaei, Shayan, Lan, Chao

arXiv.org Machine Learning | Jun-10-2026

Convergence-rate analysis for classifiers is often conducted under either Tsybakov margin or Massart margin. The former is a relatively weak condition that typically yields polynomial rates, while the latter is substantially stronger but can guarantee exponential rates. In this paper, we introduce a new condition, called Boltzmann margin, that bridges the gap between these two regimes. It is weaker than Massart margin, generally stronger than Tsybakov margin, and can imply many of their properties under suitable conditions. We apply Boltzmann margin to the analysis of kNN classifiers and establish the first near-exponential convergence rates for kNN classification. We also present extensions of the main results and provide numerical evidence supporting the main theoretical implications.

Figure 1: Example data densities on [0,1] that satisfy the different margin conditions. The Bayes decision boundary is at 0.5.
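For orientation, the two classical conditions named in this abstract are commonly stated as below for binary classification. This is the textbook form, not the paper's own notation: $\eta$, $C$, $\alpha$, $t_0$ and $h$ are symbols assumed here. The definition of Boltzmann margin is given in the paper and is not reproduced.

```latex
% Regression function (assumed notation): \eta(x) = P(Y = 1 \mid X = x)

% Tsybakov margin with exponent \alpha \ge 0: little mass near the boundary
P\bigl( \lvert \eta(X) - \tfrac{1}{2} \rvert \le t \bigr) \le C\, t^{\alpha}
\quad \text{for all } t \in (0, t_0]

% Massart margin with gap h \in (0, \tfrac{1}{2}]: \eta stays away from 1/2
\lvert \eta(X) - \tfrac{1}{2} \rvert \ge h \quad \text{almost surely}
```

Massart margin can be read as the limiting case of Tsybakov margin in which no probability mass at all lies within $h$ of the decision threshold, which is why it is the stronger of the two.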

artificial intelligence, boltzmann margin, machine learning, (19 more...)

arXiv.org Machine Learning

2606.10361

Country: North America > United States > Oklahoma (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

sup

Neural Information Processing Systems | Apr-25-2026, 07:43:39 GMT

A.1 Notation

In this appendix, we use the notation $d^{\pi}_t(\cdot,\cdot)$ to indicate the state-action visitation measure induced by the policy $\pi$ at time $t$. We overload the notation $d^{\pi}_t(\cdot)$ to denote the state-visitation measure induced by the policy $\pi$ at time $t$. Likewise, the notations $d^{D}_t(\cdot,\cdot)$ and $d^{D}_t(\cdot)$ indicate the empirical visitation measures in the dataset $D$. For a function $g: \mathcal{X} \to \mathbb{R}$, the norm $\|g\|_\infty \triangleq \sup_{x \in \mathcal{X}} |g(x)|$. Before discussing the proofs of the results, we also explain the instantiation of the function class in the tabular setting below.

A.2 Imitation gap upper bound on empirical moment matching (Theorem 3.1)

Below we restate Theorem 3.1 and provide a proof of this result. The key observation is that since the learner $\pi^{\mathrm{MM}}$ best matches the empirical distribution in the dataset, which is in turn close to the population visitation measure induced by $\pi^{E}$, we can expect the visitation measures induced by $\pi^{E}$ and $\pi^{\mathrm{MM}}$ to be close. This in turn implies that both policies will collect a similar value under any reward function. Precisely characterizing the rates at which these distributions converge to one another results in the final bound.

Consider the empirical moment matching learner $\pi^{\mathrm{MM}}$ (eq. [...] $\mathrm{TV}(d^{\pi}_t, d^{D}_t)$ (20)

where the equation follows by the variational definition of the total variation distance, and where $d^{\pi}_t$ is the state-action visitation measure induced by $\pi^{E}$ and $d^{D}_t$ is the empirical state-action visitation measure in the dataset $D$. The imitation gap of this policy can be upper bounded by $J(\pi^{E}) - J(\pi^{\mathrm{MM}}) = \mathbb{E}_{\pi^{E}}[\,\cdots\,]$ [...]

This goes to show that in the tabular setting, MM is equivalent to finding the policy which best matches (in TV-distance) the empirical state-action distribution observed in the dataset.
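The tabular quantity discussed in this excerpt, the total variation distance between a policy's state-action visitation measure and the empirical one from the dataset, can be sketched in a few lines. This is only an illustration: the function names, the toy states and actions, and the 1/2 normalization of the TV distance are assumptions made here, not taken from the paper.

```python
from collections import Counter


def empirical_measure(pairs):
    """Empirical state-action measure: the relative frequency of each pair."""
    counts = Counter(pairs)
    n = len(pairs)
    return {sa: c / n for sa, c in counts.items()}


def tv_distance(p, q):
    """Total variation distance between two discrete measures given as dicts.

    Uses the convention TV(p, q) = 0.5 * sum_x |p(x) - q(x)| (an assumption;
    some texts omit the factor 0.5).
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical dataset of (state, action) pairs observed at one time step t.
dataset = [("s0", "a0"), ("s0", "a0"), ("s1", "a1"), ("s0", "a1")]
d_data = empirical_measure(dataset)

# Hypothetical visitation measure induced by a candidate policy at time t.
d_policy = {("s0", "a0"): 0.5, ("s1", "a1"): 0.5}

gap = tv_distance(d_policy, d_data)
```

In this toy setting a moment-matching learner would search over policies for the one whose induced measure minimizes `tv_distance` against `d_data`, summed over time steps.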

artificial intelligence, machine learning, nexp, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

2e809adc337594e0fee330a64acbb982-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-19-2026, 00:15:01 GMT

exp, learner, probability 1, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Do More Predictions Improve Statistical Inference? Filtered Prediction-Powered Inference

Xu, Shirong, Sun, Will Wei

arXiv.org Machine Learning | Feb-12-2026

Recent advances in artificial intelligence have enabled the generation of large-scale, low-cost predictions with increasingly high fidelity. As a result, the primary challenge in statistical inference has shifted from data scarcity to data reliability. Prediction-powered inference methods seek to exploit such predictions to improve efficiency when labeled data are limited. However, existing approaches implicitly adopt a use-all philosophy, under which incorporating more predictions is presumed to improve inference. When prediction quality is heterogeneous, this assumption can fail, and indiscriminate use of unlabeled data may dilute informative signals and degrade inferential accuracy. In this paper, we propose Filtered Prediction-Powered Inference (FPPI), a framework that selectively incorporates predictions by identifying a data-adaptive filtered region in which predictions are informative for inference. We show that this region can be consistently estimated under a margin condition, achieving fast rates of convergence. By restricting the prediction-powered correction to the estimated filtered region, FPPI adaptively mitigates the impact of biased or noisy predictions. We establish that FPPI attains strictly improved asymptotic efficiency compared with existing prediction-powered inference methods. Numerical studies and a real-data application to large language model evaluation demonstrate that FPPI substantially reduces reliance on expensive labels by selectively leveraging reliable predictions, yielding accurate inference even in the presence of heterogeneous prediction quality.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2602.10464

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Sonoda, Sho, Akiyama, Shunta, Uezato, Yuya

arXiv.org Machine Learning | Feb-12-2026

We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $Ω(Λ^D)$ while the cut-aware hierarchical process has size $O(λ^D)$ with $λ\llΛ$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.

artificial intelligence, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Machine Learning

2602.10512

Country:

North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

5d2c2cee8ab0b9a36bd1ed7196bd6c4a-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 21:45:12 GMT

algorithm, equation, probability, (16 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.50)
Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)

240d297094fc76d1e7aa27b01f221b00-Paper-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 22:03:26 GMT

classification, classifier, precision, (16 more...)

Neural Information Processing Systems

Country:

Europe > Poland > Greater Poland Province > Poznań (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Sampath Kannan, Jamie H. Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

Neural Information Processing Systems | Nov-20-2025, 23:23:49 GMT

We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data.

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Minnesota (0.04)
North America > Canada > Quebec > Montreal (0.04)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.42)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems | Oct-2-2025, 23:53:55 GMT

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper looks at differentially private algorithms for a generic maximization problem (private argmax might be a good name). Given a collection of K items, a data set D of n individuals, and a score function f that assigns each item i a data-based score f(i;D), the goal is to find an item i with approximately maximal score while preserving differential privacy. This private argmax has proven to be a fundamental problem in the theory of private data analysis. It was first formulated by McSherry and Talwar (2007), who proposed the exponential mechanism to solve it.
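As a rough illustration of the private argmax problem described in this review, the exponential mechanism can be sketched as follows. This is the standard textbook form, in which item i is sampled with probability proportional to exp(ε · f(i;D) / (2Δ)), with Δ the sensitivity of the score function. It is not the algorithm proposed in the reviewed paper, and the function names, the `sensitivity` parameter and the example scores are assumptions made here.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Probability of selecting each item under the exponential mechanism.

    scores[i] plays the role of f(i; D). `sensitivity` is assumed to bound how
    much any single score can change when one individual's data changes.
    """
    scale = epsilon / (2.0 * sensitivity)
    top = max(scores)
    # Subtracting the maximum avoids overflow and leaves the distribution unchanged.
    weights = [math.exp(scale * (s - top)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=None):
    """Sample one item index according to the selection probabilities."""
    rng = rng or random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


# Hypothetical scores for K = 3 items.
example_scores = [1.0, 2.0, 5.0]
probs = selection_probabilities(example_scores, epsilon=1.0)
chosen = exponential_mechanism(example_scores, epsilon=1.0, rng=random.Random(0))
```

A larger `epsilon` concentrates the probability on the highest-scoring item at the cost of weaker privacy, while `epsilon = 0` gives a uniform draw that ignores the data entirely.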

algorithm, exponential mechanism, mechanism, (11 more...)

Neural Information Processing Systems

Country: North America > Canada > Quebec > Montreal (0.05)

Genre: Overview (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)
